Columbus
Scaling Large Language Model Training on Frontier with Low-Bandwidth Partitioning
Xu, Lang, Anthony, Quentin, Hatef, Jacob, Shafi, Aamir, Subramoni, Hari, Panda, Dhabaleswar K.
Scaling up Large Language Model (LLM) training involves fitting a tremendous number of training parameters across a limited number of workers. However, methods like ZeRO-3 that drastically reduce GPU memory pressure often incur heavy communication to ensure global synchronization and consistency. Established efforts such as ZeRO++ use secondary partitions to avoid inter-node communication, given that intra-node GPU-GPU transfer generally has more bandwidth and lower latency than inter-node connections. However, as more capable infrastructure like Frontier, equipped with AMD GPUs, has emerged with impressive computing capability, there is a need to investigate the hardware topology and to develop targeted strategies to improve training efficiency. In this work, we propose a collection of communication and optimization strategies for ZeRO++ to reduce communication costs and improve memory utilization. In particular, we propose a 3-level hierarchical partitioning designed specifically for the current Top-1 supercomputing cluster, Frontier, which leverages the differing bandwidths across communication layers (GCD-GCD, GPU-GPU, and inter-node) to reduce communication overhead. For a 20B GPT model, we observe a 1.71x increase in TFLOPS per GPU compared with ZeRO++ at up to 384 GCDs, and a scaling efficiency of 0.94 for up to 384 GCDs. To the best of our knowledge, our work is also the first effort to efficiently optimize LLM workloads on Frontier AMD GPUs.
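The three communication levels named in the abstract can be illustrated with a small rank-mapping sketch. This is not the authors' partitioning code. It assumes a Frontier-style node with 4 GPU packages of 2 GCDs each, and it assumes ranks are numbered contiguously within a GPU and then within a node. The names `locate` and `link_level` are hypothetical.

```python
# Assumed layout: 4 GPU packages per node, 2 GCDs per package (8 GCDs per node).
# Rank numbering is assumed contiguous: GCDs of one GPU first, then GPUs of one node.
GCDS_PER_GPU = 2
GPUS_PER_NODE = 4


def locate(rank: int) -> tuple[int, int, int]:
    """Map a global GCD rank to (node, gpu, gcd) indices."""
    gcds_per_node = GCDS_PER_GPU * GPUS_PER_NODE
    node, local = divmod(rank, gcds_per_node)
    gpu, gcd = divmod(local, GCDS_PER_GPU)
    return node, gpu, gcd


def link_level(rank_a: int, rank_b: int) -> str:
    """Name the communication level that two distinct ranks would use."""
    if rank_a == rank_b:
        raise ValueError("ranks must be distinct")
    node_a, gpu_a, _ = locate(rank_a)
    node_b, gpu_b, _ = locate(rank_b)
    if node_a != node_b:
        return "inter-node"
    if gpu_a != gpu_b:
        return "GPU-GPU"
    return "GCD-GCD"
```

Under these assumptions, a hierarchical scheme would try to keep the most frequent transfers between ranks whose `link_level` is `"GCD-GCD"` and the least frequent ones at `"inter-node"`.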
An unsupervised method for MRI recovery: Deep image prior with structured sparsity
Sultan, Muhammad Ahmad, Chen, Chong, Liu, Yingmin, Gil, Katarzyna, Zareba, Karolina, Ahmad, Rizwan
Objective: To propose and validate an unsupervised MRI reconstruction method that does not require fully sampled k-space data. Materials and Methods: The proposed method, deep image prior with structured sparsity (DISCUS), extends the deep image prior (DIP) by introducing group sparsity to frame-specific code vectors, enabling the discovery of a low-dimensional manifold for capturing temporal variations. DISCUS was validated using four studies: (I) simulation of a dynamic Shepp-Logan phantom to demonstrate its manifold discovery capabilities, (II) comparison with compressed sensing and DIP-based methods using simulated single-shot late gadolinium enhancement (LGE) image series from six distinct digital cardiac phantoms in terms of normalized mean square error (NMSE) and structural similarity index measure (SSIM), (III) evaluation on retrospectively undersampled single-shot LGE data from eight patients, and (IV) evaluation on prospectively undersampled single-shot LGE data from eight patients, assessed via blind scoring from two expert readers. Results: DISCUS outperformed competing methods, demonstrating superior reconstruction quality in terms of NMSE and SSIM (Studies I-III) and expert reader scoring (Study IV). Discussion: An unsupervised image reconstruction method is presented and validated on simulated and measured data. These developments can benefit applications where acquiring fully sampled data is challenging.
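The abstract does not define the NMSE metric it reports. The sketch below uses a common definition, the squared error energy divided by the reference energy, which may differ from the paper's exact normalization. The function name `nmse` and the flat-sequence input format are assumptions made for illustration.

```python
def nmse(estimate, reference):
    """Normalized mean square error: ||estimate - reference||^2 / ||reference||^2.

    Inputs are equal-length flat sequences of real or complex pixel values.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    error_energy = sum(abs(e - r) ** 2 for e, r in zip(estimate, reference))
    reference_energy = sum(abs(r) ** 2 for r in reference)
    if reference_energy == 0:
        raise ValueError("reference has zero energy")
    return error_energy / reference_energy
```

Lower values indicate a reconstruction closer to the reference, and a perfect reconstruction gives 0.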
log-RRIM: Yield Prediction via Local-to-global Reaction Representation Learning and Interaction Modeling
Hu, Xiao, Chen, Ziqi, Peng, Bo, Adu-Ampratwum, Daniel, Ning, Xia
Accurate prediction of chemical reaction yields is crucial for optimizing organic synthesis, potentially reducing time and resources spent on experimentation. With the rise of artificial intelligence (AI), there is growing interest in leveraging AI-based methods to accelerate yield predictions without conducting in vitro experiments. We present log-RRIM, an innovative graph transformer-based framework designed for predicting chemical reaction yields. Our approach implements a unique local-to-global reaction representation learning strategy. It first captures detailed molecule-level information and then models and aggregates intermolecular interactions, ensuring that the impact of molecular fragments of varying sizes on yield is accurately accounted for. Another key feature of log-RRIM is its integration of a cross-attention mechanism that focuses on the interplay between reagents and reaction centers. This design reflects a fundamental principle in chemical reactions: the crucial role of reagents in influencing bond-breaking and bond-formation processes, which ultimately affect reaction yields. log-RRIM outperforms existing methods in our experiments, especially for medium to high-yielding reactions, demonstrating its reliability as a predictor. Its advanced modeling of reactant-reagent interactions and sensitivity to small molecular fragments make it a valuable tool for reaction planning and optimization in chemical synthesis. The data and code of log-RRIM are accessible through https://github.com/ninglab/Yield_log_RRIM.
Vehicle-in-Virtual-Environment Method for ADAS and Connected and Automated Driving Function Development/Demonstration/Evaluation
Cao, Xincheng, Chen, Haochong, Aksun-Guvenc, Bilin, Guvenc, Levent
The current approach to developing new Advanced Driver Assistance System (ADAS) and Connected and Automated Driving (CAD) functions involves a significant amount of public road testing. This is inefficient because of the number of miles that need to be driven for rare and extreme events to take place, which also makes it very costly, and it is unsafe because the rest of the road users become involuntary test subjects. A new method for safe, efficient, and repeatable development, demonstration, and evaluation of ADAS and CAD functions, called Vehicle-in-Virtual-Environment (VVE), was recently introduced as a solution to this problem. During VVE, the vehicle is operated in a large, empty, and flat area while its localization and perception sensor data are fed from the virtual environment, with other traffic and rare and extreme events being generated as needed. The virtual environment can be easily configured and modified to construct different testing scenarios on demand. This paper focuses on the VVE approach and introduces the coordinate transformations needed to sync pose (location and orientation) in the virtual and physical worlds, as well as the handling of localization and perception sensor data, using the highly realistic 3D simulation model of a recent autonomous shuttle deployment site in Columbus, Ohio as the virtual world. As a further example that uses multiple actors, the use of VVE for Vehicle-to-VRU communication based Vulnerable Road User (VRU) safety is presented using VVE experiments with real pedestrian(s) in a safe and repeatable manner. VVE experiments are used to demonstrate the efficacy of the method.
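The pose synchronization mentioned above can be pictured as a planar rigid transform. The sketch below is only an illustration under simplifying assumptions: a flat 2D test area, a fixed rotation between the physical and virtual frames, and a known virtual-world position for the physical origin. The function and parameter names are hypothetical, and the paper's actual transformations may involve additional steps.

```python
import math


def wrap_angle(angle: float) -> float:
    """Wrap an angle in radians to the interval (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def physical_to_virtual(pose, virtual_origin, heading_offset):
    """Map a physical pose (x, y, yaw) into the virtual world frame.

    virtual_origin: (x, y) of the physical frame's origin in the virtual world.
    heading_offset: rotation in radians from the physical to the virtual frame.
    """
    x, y, yaw = pose
    c, s = math.cos(heading_offset), math.sin(heading_offset)
    x_virtual = virtual_origin[0] + c * x - s * y
    y_virtual = virtual_origin[1] + s * x + c * y
    return x_virtual, y_virtual, wrap_angle(yaw + heading_offset)
```

For instance, a vehicle 1 m ahead of the physical origin, with the frames rotated by 90 degrees and the origin placed at (10, 20) in the virtual world, would be placed 1 m along the virtual y axis from that point.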
Thirsty Fabs
This year, Samsung is planning to open a semiconductor chip manufacturing plant in Taylor, TX, that will cost the company an estimated $17 billion. Intel is building a $20 billion facility in Columbus, OH, and industry leaders GlobalFoundries, TSMC, and Texas Instruments are building their own so-called chip fabs in the U.S. as well. This construction boom has been spurred in part by increasing demand for the smartphones, personal electronic devices, and Artificial Intelligence (AI) services that depend on chips, and the $50 billion in funding that the 2022 CHIPS and Science Act allocated to American semiconductor manufacturing has proven to be a strong incentive. Yet the boom is global, with new plants being developed all over the world. As companies plan these new chip fabs, one of the first questions they need to answer is where they are going to get their water.
A Fair and In-Depth Evaluation of Existing End-to-End Entity Linking Systems
Bast, Hannah, Hertel, Matthias, Prange, Natalie
Existing evaluations of entity linking systems often say little about how the system is going to perform for a particular application. There are two fundamental reasons for this. One is that many evaluations only use aggregate measures (like precision, recall, and F1 score), without a detailed error analysis or a closer look at the results. The other is that all of the widely used benchmarks have strong biases and artifacts, in particular: a strong focus on named entities, an unclear or missing specification of what else counts as an entity mention, poor handling of ambiguities, and an over- or underrepresentation of certain kinds of entities. We provide a more meaningful and fair in-depth evaluation of a variety of existing end-to-end entity linkers. We characterize their strengths and weaknesses and also report on reproducibility aspects. The detailed results of our evaluation can be inspected under https://elevant.cs.uni-freiburg.de/emnlp2023 . Our evaluation is based on several widely used benchmarks, which exhibit the problems mentioned above to various degrees, as well as on two new benchmarks, which address the problems mentioned above. The new benchmarks can be found under https://github.com/ad-freiburg/fair-entity-linking-benchmarks .
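For reference, the aggregate measures named above can be computed as in the following sketch. It assumes each prediction and each ground-truth label is a hashable (mention span, entity ID) pair and that only exact matches count as correct. The matching rules used in the paper's evaluation may be more nuanced, and the function name is hypothetical.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 over sets of (span, entity_id) pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

A single triple of numbers like this hides which kinds of mentions were missed or mislinked, which is the limitation of aggregate-only evaluation that the abstract describes.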
Exploring the Potential of AI-Generated Synthetic Datasets: A Case Study on Telematics Data with ChatGPT
This research delves into the construction and utilization of synthetic datasets, specifically within the telematics sphere, leveraging OpenAI's powerful language model, ChatGPT. Synthetic datasets present an effective solution to challenges pertaining to data privacy, scarcity, and control over variables, characteristics that make them particularly valuable for research pursuits. The utility of these datasets, however, largely depends on their quality, measured through the lenses of diversity, relevance, and coherence. To illustrate this data creation process, a hands-on case study is conducted, focusing on the generation of a synthetic telematics dataset. The experiment involved iteratively guiding ChatGPT, progressively refining prompts and culminating in the creation of a comprehensive dataset for a hypothetical urban planning scenario in Columbus, Ohio. Upon generation, the synthetic dataset was subjected to an evaluation, focusing on the previously identified quality parameters and employing descriptive statistics and visualization techniques for a thorough analysis. Although synthetic datasets are not perfect replacements for real-world data, their potential in specific use cases, when executed with precision, is significant. This research underscores the potential of AI models like ChatGPT in enhancing data availability for complex sectors like telematics, thus paving the way for a myriad of new research opportunities.
Using Collision Momentum in Deep Reinforcement Learning Based Adversarial Pedestrian Modeling
Chen, Dianwei, Yurtsever, Ekim, Redmill, Keith, Ozguner, Umit
Recent research in pedestrian simulation often aims to develop realistic behaviors in various situations, but it is challenging for existing algorithms to generate behaviors that identify weaknesses in automated vehicles' performance in extreme and unlikely scenarios and edge cases. To address this, specialized pedestrian behavior algorithms are needed. Current research focuses on realistic trajectories using social force models and reinforcement learning based models. However, we propose a reinforcement learning algorithm that specifically targets collisions and better uncovers unique failure modes of automated vehicle controllers. Our algorithm is efficient and generates more severe collisions, allowing for the identification and correction of weaknesses in autonomous driving algorithms in complex and varied scenarios.
Hardware-in-the-Loop and Road Testing of RLVW and GLOSA Connected Vehicle Applications
Kavas-Torris, Ozgenur, Cantas, Mustafa Ridvan, Gelbal, Sukru Yaren, Guvenc, Levent
This paper presents an evaluation of two different Vehicle to Infrastructure (V2I) applications, namely Red Light Violation Warning (RLVW) and Green Light Optimized Speed Advisory (GLOSA). The evaluation method is to first develop and use Hardware-in-the-Loop (HIL) simulator testing, followed by extension of the HIL testing to road testing using an experimental connected vehicle. The HIL simulator used in the testing is a state-of-the-art simulator that contains the same hardware used at real intersections, such as the roadside unit and traffic cabinet, and allows realistic testing of numerous traffic, intersection geometry, and timing scenarios. First, the RLVW V2I algorithm is tested in the HIL simulator and then implemented in an On-Board-Unit (OBU) in our experimental vehicle and tested at real-world intersections. This same approach of HIL testing followed by testing at real intersections using our experimental vehicle is later extended to the GLOSA application. The GLOSA application tested in this paper provides an optimal speed advisory for passing at the green light and also includes a red light violation warning system. The paper presents the HIL and experimental vehicle evaluation systems, background on RLVW and GLOSA, and the HIL simulation and road testing results with their interpretations.
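The basic idea behind a green-light speed advisory can be sketched as a constant-speed arrival-time calculation. This is a simplified illustration, not the GLOSA algorithm evaluated in the paper. It assumes the vehicle knows its distance to the stop line and the start and end of the next green window (in seconds from now), and it ignores acceleration limits and queued traffic. All names are hypothetical.

```python
def advisory_speed(distance_m, green_start_s, green_end_s, v_min, v_max):
    """Return the highest constant speed (m/s) in [v_min, v_max] that reaches
    the stop line during the green window, or None if no such speed exists."""
    if distance_m <= 0 or green_end_s <= 0 or green_end_s < green_start_s:
        return None
    # Arriving no later than green_end_s requires at least this speed.
    slowest = max(distance_m / green_end_s, v_min)
    # Arriving no earlier than green_start_s caps the speed (no cap if already green).
    cap = distance_m / green_start_s if green_start_s > 0 else float("inf")
    fastest = min(cap, v_max)
    return fastest if slowest <= fastest else None
```

When the function returns `None`, no speed within the allowed range passes on green; in this simplified picture, that is the situation in which a red light violation warning would become relevant.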
Discrete-time Robust PD Controlled System with DOB/CDOB Compensation for High Speed Autonomous Vehicle Path Following
Path following performance is a significant consideration for autonomous vehicles. This paper presents the discrete-time design of a robust PD controlled system with disturbance observer (DOB) and communication disturbance observer (CDOB) compensation to enhance autonomous vehicle path following performance. Although always implemented on digital devices, DOB and CDOB structures are usually designed in continuous time in the literature and also in our previous work. However, a high sampling rate is required for rapid controller prototyping systems to automatically convert a continuous-time design block diagram into the corresponding discrete-time controller. In this paper, direct discrete-time design is carried out. A digital PD feedback controller is designed based on the nominal plant using the proposed parameter space approach. The zero-order hold method is applied to discretize the continuous-domain nominal plant, DOB, and CDOB structures. The discrete-time DOB is embedded into the steering to path following error loop for model regulation in the presence of uncertainty in vehicle parameters such as vehicle mass, vehicle speed, and road-tire friction coefficient, and for rejecting external disturbances like crosswind force. On the other hand, time delay from CAN bus based sensor and actuator command interfaces degrades system performance, since large negative phase angles are added to the plant frequency response. The discrete-time CDOB compensated control system can be used for time delay compensation where accurate knowledge of the delay value is not necessary. A validated model of our lab's Ford Fusion hybrid automated driving research vehicle is used for the simulation analysis while the vehicle is driving at high speed. Simulation results demonstrate the improvement in autonomous vehicle path following performance with the proposed discrete-time DOB and CDOB structure.
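To make the zero-order hold step concrete, the sketch below discretizes a first-order low-pass filter G(s) = 1/(tau*s + 1), the kind of filter often used inside disturbance observer designs. It is a stand-in chosen for illustration, not the vehicle model or the DOB/CDOB structure from the paper, and the names and parameter values are assumptions. For this filter, the exact ZOH equivalent is y[k+1] = a*y[k] + (1 - a)*u[k] with a = exp(-T/tau), where T is the sampling period.

```python
import math


def zoh_first_order(tau: float, sample_time: float) -> tuple[float, float]:
    """Return (a, b) such that y[k+1] = a*y[k] + b*u[k] is the zero-order-hold
    equivalent of G(s) = 1 / (tau*s + 1)."""
    if tau <= 0 or sample_time <= 0:
        raise ValueError("tau and sample_time must be positive")
    a = math.exp(-sample_time / tau)
    return a, 1.0 - a


def simulate(a: float, b: float, inputs, y0: float = 0.0):
    """Run the difference equation over an input sequence and return the outputs."""
    y = y0
    outputs = []
    for u in inputs:
        y = a * y + b * u
        outputs.append(y)
    return outputs
```

With tau = 0.1 s and T = 0.01 s, ten steps of a unit step input bring the output to 1 - exp(-1), matching the continuous-time step response at t = tau. This exact agreement at the sampling instants is what distinguishes the ZOH equivalent from approximate discretizations.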